This map shows the GPS location density for a population of Grey Wolves in northeastern Alberta’s Athabasca Oil Sands Region from early 2012 to late 2014 and highlights the location data for one wolf of interest in blue. Clearly, the wolf of interest has unique movement patterns, but how do we gain insights about this wolf’s pattern of life? This tutorial demonstrates a method to extract reliable loiter information from large spatial-temporal GPS data sets.

Abstract

The rapid growth of spatial-temporal data is advancing analysts’ intelligence capabilities. With the abundance of available GPS data, analysts can study an individual’s pattern of life and inform resource allocation decisions. Visibility on when an individual stops, how long they stop, and when they are in transit can aid in characterizing an area of interest, especially in areas where other information is sparse. Analysts currently use spatial density-based clustering techniques to glean insights from large GPS data sets, which have proved useful in characterizing areas of interest by identifying locations where a high number of observations occur relatively close together. Loiter information, however, can only be analyzed by adding a time component to the clustering. This tutorial offers a method to employ Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) on both spatial and temporal information. We found that clustering with HDBSCAN spatially and temporally can overcome the known disadvantages of common density-based clustering methods and yield reliable loiter information to inform a pattern of life.

Introduction

The challenge in using spatial-temporal data is correctly labeling each observation as moving or loitering (non-moving). The main assumption common to all loiter inference algorithms is that when an individual is loitering, their observations will be relatively close geographically. In contrast, when an individual is moving, the distances between consecutive observations will be comparatively larger. Estimating this relative spatial difference is the key to classifying an observation as loitering or moving. Analysts employ well-established unsupervised learning clustering algorithms to detect possible loiter locations from GPS data, such as time-based clustering (Hariharan and Toyama 2004) and (Liu, Wolfson, and Yin 2006), the centroid-based k-means clustering (Ashbrook and Starner 2002), and various density-based algorithms such as DBSCAN (Ester, Kriegel, and Xu, n.d.), ST-DBSCAN (Birant and Kut 2007), T-DBSCAN (Chen, Ji, and Wang 2014), and HDBSCAN (Campello et al. 2015).

Time-based clustering methods are used with data where movement triggers observations. This method determines possible loiter locations by finding observations within a defined maximum spatial distance over a period longer than a defined minimum time interval (Hariharan and Toyama 2004) and (Liu, Wolfson, and Yin 2006). This method has been shown to succeed in finding clusters of tightly-placed observations with large inter-observation time intervals, typical of a loiter location. However, analysts are not always afforded data on every individual movement. Without such comprehensive coverage, time intervals without observations are no guarantee of a lack of movement, and other clustering methods should be used. If observations are more of a random sample of individual locations, loiter locations might be found by density of points alone. The k-means method divides the data into \(k\) regions clustered as tightly as possible around their centroids, which could be candidate loiter locations (Ashbrook and Starner 2002). Unfortunately, this model can generate inaccurate or spurious candidate locations if \(k\) is not correctly chosen, and determining the proper \(k\) can be an iterative, subjective process. Additionally, the time dimension - an integral part of the definition of loiter information - is completely discarded by k-means.
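The sensitivity to \(k\) can be made concrete with a minimal sketch using scikit-learn's KMeans on synthetic data (the two "loiter locations" and all parameter values here are illustrative assumptions, not part of the wolf study):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Two synthetic "loiter locations", 40 observations each
a = rng.normal(loc=(0, 0), scale=1, size=(40, 2))
b = rng.normal(loc=(10, 10), scale=1, size=(40, 2))
X = np.vstack([a, b])

# k = 2 recovers both locations; k = 3 is forced to split a real
# location into spurious candidate locations
centers_k2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_
centers_k3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X).cluster_centers_
```

With the correct \(k\) the centroids land near the true locations, but nothing in the algorithm warns the analyst when \(k\) is wrong, and the timestamps never enter the computation.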

Density-based methods can identify clusters, including irregularly-shaped ones that don’t necessarily stick close to their centroids, without needing to specify how many clusters to search for. DBSCAN takes two parameters: a maximum spatial distance \(\epsilon\) and the minimum number of observations required for a new cluster, \(minPts\). From these parameters, DBSCAN produces clusters of observations spaced within \(\epsilon\) spatial distance (Ester, Kriegel, and Xu, n.d.). ST-DBSCAN and T-DBSCAN further discriminate clusters based on temporal distance, which avoids identifying non-loiter clusters such as roads or paths (Chen, Ji, and Wang 2014) and (Birant and Kut 2007). The disadvantage is that these algorithms effectively learn what density of observations to expect from the most dense clusters in the data, then ignore potential clusters with lower density. Since individuals loiter in locations of varying size and for varying time intervals, we expect that a characterization of all loiter locations in a GPS dataset would include clusters of varying density. HDBSCAN, which extends the DBSCAN algorithm to identify clusters of varying density, is more suitable for identifying loiter locations.
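The fixed-density limitation can be seen in a small sketch with scikit-learn's DBSCAN (the synthetic coordinates and parameter values are illustrative assumptions): an \(\epsilon\) tuned to a dense cluster leaves most of a sparser, equally real cluster labeled as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# A dense cluster (spread ~5 m) and a sparse cluster (spread ~100 m)
dense = rng.normal(loc=(0, 0), scale=5, size=(50, 2))
sparse = rng.normal(loc=(1000, 1000), scale=100, size=(50, 2))
X = np.vstack([dense, sparse])

# eps tuned for the dense cluster: the sparse cluster is mostly labeled noise (-1)
labels = DBSCAN(eps=15, min_samples=5).fit_predict(X)
```

A loiter location visited briefly but repeatedly (dense) and one occupied loosely for hours (sparse) are both real, which is why a varying-density method like HDBSCAN is preferable here.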

This tutorial demonstrates an implementation of the density-based clustering algorithm HDBSCAN to infer loiter information about an individual of interest from a GPS dataset. We begin by conducting collection analysis to understand the limitations of the available data. Then we compute data-informed HDBSCAN parameters and cluster spatially. Next, we examine each spatial cluster and cluster again temporally to ensure that the cluster is better understood as a loiter location than a location frequently visited in transit. Finally, we process each cluster for loiter information and visualize our results on an interactive map.

Case Study Data

This tutorial uses movement data from grey wolves in northeastern Alberta’s Athabasca Oil Sands Region, available here. The primary use of this data is in support of the Arctic Animal Movement Archive. The data consists of a large Comma-Separated Values (csv) file recording (non-comprehensively) the locations of 43 wolves between March 17, 2012 and September 13, 2014. Further information on how this telemetry data was collected is available here. We use this data as a placeholder for GPS data on persons of interest.

We begin by reading in and processing the wolf movement data using the R package data.table (Dowle et al. 2019). We use this package extensively in this tutorial, taking advantage of its unique methods for large data exploration and manipulation. Check here for a very insightful introduction to data.table.

The tabs below demonstrate the Extraction Transformation and Load (ETL) process implemented in this tutorial.

Extraction

The computation below leverages data.table’s fread function to read the csv and answer some preliminary exploratory questions concerning the data.

# Leverage fread function from package data.table to quickly read in csv data as an R object.
initData <- fread("data/ABoVE_ Boutin Alberta Grey Wolf.csv")

# How many wolves?
initData[# all rows
         ,
         # data.table fast unique count for the column 
         # labeling the different wolves
         uniqueN(`tag-local-identifier`)]
## [1] 43
# How big is the data set
utils::object.size(initData)
## 32539288 bytes
# What are the dimensions
dim(initData)
## [1] 239194     18
# View the first three rows
head(initData,3) %>% formattable(align="l") %>%
  formattable::as.datatable(rownames=FALSE,
                            options=list(
                              searching = FALSE,
                              scrollX = TRUE,
                              columnDefs = list(list(className = 'dt-left', targets = '_all'))))

Transformation

The raw data set has 239,194 observations on 43 wolves. We only need the following columns for our study:

  • study-local-timestamp
  • tag-local-identifier
  • location-long
  • location-lat

The following computation extracts the data relevant to this study, and customizes the column labels for convenience.

# Load only the relevant columns
data <- fread("data/ABoVE_ Boutin Alberta Grey Wolf.csv",
             select=c("study-local-timestamp",
                     "tag-local-identifier",
                     "location-long",
                     "location-lat"),
            
             # Make sure data.table does not automatically generate factor columns
             stringsAsFactors = FALSE) %>% 
  
             # Omit NAs in the data. Familiarization with how the data was collected is 
             # necessary to consider retaining these values and making them useful
             na.omit()

# Set the column names for convenience
setnames(data, 
         
         # Vector of old names that we want to change
         old=c("study-local-timestamp",
                     "tag-local-identifier",
                     "location-long",
                     "location-lat"),
         
         # Vector of new more convenient names
         new=c("dtg", # date time group
               "cid", # component ID
               "lon", # longitude
               "lat")) # latitude

Load

This study focuses on making inferences about one individual, so we use one wolf as a surrogate. The computation below identifies a wolf of interest and subsets the data to only include this wolf’s observations.

# Use data.table indexing to determine the wolf with the most data
wolf.of.Interest <- data[,.(count=.N),by="cid"][count==max(count)]$cid # The chained [] selects the cid with the maximum count

# Subset to focus on one wolf:
dt_init <- data[cid==wolf.of.Interest]
m <- nrow(dt_init)

# Create datetime objects from dtg character strings
dt_init[,"dtg" := dtg %>% as.POSIXct(format="%Y-%m-%d %H:%M:%S")]

# Order the data sequentially by date time group
setorder(dt_init,dtg)

# Set inter-obs time interval column
dt_init[,"timeInt" := dtg %>% difftime(shift(dtg),units="secs")]
dt_init[,"timeInt" := ifelse(timeInt <= 24*3600,timeInt,NA)]

# Use lubridate package to get the weekday from the date objects
dt_init[,"Weekday" := dtg %>% lubridate::wday(label=TRUE)]

# Get the hour from the date objects
dt_init[,"Hour" := lubridate::hour(dtg)]

# Get the time of day for each dtg
dt_init[,"time" := as.ITime(dtg)]

# Set group time label (for plotting)
dt_init$group <- cut(dt_init$time,
                breaks=c(0,6*3600,9*3600,12*3600,
                         15*3600,18*3600,21*3600,24*3600),
                labels=c("0000 - 0600","0600 - 0900","0900 - 1200",
                         "1200 - 1500","1500 - 1800","1800 - 2100",
                         "2100 - 2400"))

save(dt_init,file="products/dt_init.RData")
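For readers working in Python, the time-of-day binning above can be mirrored with pandas.cut (the variable names here are illustrative; the tutorial itself stays in data.table):

```python
import pandas as pd

# Seconds since midnight for a few hypothetical observations
secs = pd.Series([1 * 3600, 8 * 3600, 13 * 3600, 22 * 3600])

bins = [0, 6, 9, 12, 15, 18, 21, 24]
labels = ["0000 - 0600", "0600 - 0900", "0900 - 1200", "1200 - 1500",
          "1500 - 1800", "1800 - 2100", "2100 - 2400"]

# pd.cut bins hours into the same right-closed intervals as R's cut()
group = pd.cut(secs / 3600, bins=bins, labels=labels)
```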

Collection Analysis

Before implementing our clustering algorithm, we perform collection analysis to assess the feasibility of clustering and to measure the degree of any temporal collection bias. We use a novel statistical test to show that the clusters present in the wolf of interest’s data are different from the clusters seen in the wolf data as a whole. We also assess that there is no significant evidence of weekday or time-of-day collection bias. We do, however, find significant collection bias by month and year: some time periods have very high collection volume, while others have little to no collection. The tabs below demonstrate our collection analysis.

Volume by Weekday and Hour

We manipulate the data and generate a bivariate heatmap to study the collection volume on every weekday, for each hour of the day.


Figure 1: Collection Volume by Hour and Weekday

This chart shows us that, aside from a low level of collection on Monday afternoons and early Sunday mornings, collection is generally uniform across all weekdays and times of day. Further analysis is required to determine the statistical significance of how far this data departs from a bivariate uniform distribution. That analysis is omitted in this study, and we assume the data is generally free of hourly and weekday bias.

Volume by Date

Next we check the collection volume over the time range of the data to judge the collection bias with respect to each day of the study. The computation below generates a scatter plot that conveys how the collection volume has changed over time.


Figure 2: Collection Volume over Time

The interactive chart reveals three spikes in collection volume, in April 2012, March 2013, and January to March 2014, with drastically lower observation counts on the other dates. Also, there are no observations between May 2012 and March 2013, which can drastically skew the inter-observation time distribution. Unlike the relatively even distribution of collection by weekday and time of day, this chart conveys severe collection bias over the time range of the study, which means we cannot assume random collection with respect to year or season. Further analysis is required to investigate the cause of the collection bias. We continue our study, restricting ourselves to identifying locations where the wolf of interest is likely loitering, but mindful that these locations may be year- or season-dependent.

Inter-observation Time

The wolf’s activity between observations is unknown, which can skew our results. We need to study the inter-observation times to judge how well the data represent the wolf’s locations.


Figure 3: Inter-observation Time Histogram

The histogram above shows that 95% of the inter-observation times are less than 15 minutes, which means that loiter locations with a dwell time of one hour may have as few as four observations in our data even if we have nominal coverage of the time period. This is typical of many GPS data sets analysts encounter. We can still make reasonable inferences, with the caveat that inferred loiter locations are only predictions based on the information available, and must be corroborated with other information sources.
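A percentile like the 95% figure above can be computed directly from the timestamp gaps; a minimal sketch with hypothetical timestamps (the wolf data itself uses the dtg column built during the Load step):

```python
import numpy as np
import pandas as pd

# Hypothetical observation timestamps for one individual
ts = pd.to_datetime(["2012-04-01 00:00", "2012-04-01 00:10", "2012-04-01 00:21",
                     "2012-04-01 00:33", "2012-04-01 02:00"])

# Inter-observation gaps in minutes, then the 95th percentile
gaps_min = np.diff(ts.values).astype("timedelta64[s]").astype(float) / 60
p95 = np.quantile(gaps_min, 0.95)
```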

Spatial-Temporal Clustering

We are now ready to begin extracting loiter information from the data. We implement HDBSCAN to first identify spatial clusters, then we cluster again temporally to find space-time clusters of interest. We then process these results and aggregate loiter information from our study, and convey the results in an interactive html map.

Optimal Parameter Values

The next step in our analysis before employing HDBSCAN is to choose data-informed values for the spatial and temporal \(cluster\_selection\_epsilon\) parameters. These parameters are used in ST-DBSCAN as the maximum spatial and temporal distance allowed for two observations to be clustered together (Birant and Kut 2007). In HDBSCAN, these parameters set the minimum distance considered in evaluating cluster densities, and allow us to group together clusters within this distance (Campello et al. 2015). This parameter prevents the algorithm from incorrectly labeling cluster boundary points as outliers when the cluster has a relatively high density. More information about this parameter is available here. This is a reliable technique for mitigating the known disadvantages of HDBSCAN by combining the algorithm with DBSCAN, while still identifying clusters of varying densities.

We determine optimal parameter values by computing the distance to each observation’s nearest neighbor, sorting these distances, then finding the point where the increase in nearest-neighbor distance is the most dramatic (Rahmah and Sitanggang 2016). More details about this method are discussed here, and a tutorial for how to implement this method in Python is available here. We find where the change in the nearest-neighbor distance is the most pronounced (also known as the “knee”) by finding the max curvature in the line plot (Satopaa et al. 2011) using the method described here and implemented in Python using the kneed module available here (Arvai 2020a).
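The sorted nearest-neighbor distances that feed the knee search can be computed with scikit-learn (synthetic points here; the tutorial's actual plots use the wolf coordinates and haversine distances):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))

# k=2 because the closest "neighbor" of each point is the point itself
nn = NearestNeighbors(n_neighbors=2).fit(X)
dists, _ = nn.kneighbors(X)

# Sorted distance to the true nearest neighbor; the "knee" of this
# curve gives the cluster_selection_epsilon candidate
sorted_nn = np.sort(dists[:, 1])
```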

Find kneedle function

We define the Python function below to calculate the max curvature of a line plot. We use it by setting the x-axis as the order vector and the y-axis as the distances vector, which produces a convex increasing curve. The y-value for the knee in the curve is the optimal parameter value for \(cluster\_selection\_epsilon\).

from kneed import KneeLocator

def findKnee(order,distances,smoothParam,interpMethod='polynomial'):
    kneedle = KneeLocator(order,
                          distances,
                          S=smoothParam,
                          interp_method=interpMethod,
                          curve='convex',
                          direction='increasing',
                          online=True)
    return kneedle.knee_y

Epsilon Spatial Distance

As described here, we first calculate the spatial distance to the nearest neighbor for each observation and sort these values. We then find the maximum curvature, and plot the results.


Figure 4: Sorted Nearest Neighbor Spatial Distance (meters)

The plot above shows that ~200 meters is the distance just before the nearest neighbor distances rapidly increase. We store this value as epsSpat for use later.

Epsilon Temporal Distance

We now apply the same method to get the optimal temporal epsilon distance.


Figure 5: Sorted Nearest Neighbor Temporal Distance (seconds)

The plot above shows that the optimal temporal epsilon distance is 12 seconds. Even though the inter-observation times are mostly between 10 and 15 minutes, the differences in time of day between observations are much closer. We store this value as epsTime for use later.
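The temporal clustering later uses a precomputed time-distance matrix (timeDistMat). Since the text compares times of day rather than absolute timestamps, one plausible construction is a circular distance that wraps at midnight; this helper is an assumption about how such a matrix could be built, not code from the tutorial:

```python
import numpy as np

def tod_dist_matrix(seconds_of_day):
    """Pairwise circular time-of-day distance in seconds, wrapping at midnight."""
    t = np.asarray(seconds_of_day, dtype=float)
    diff = np.abs(t[:, None] - t[None, :])
    return np.minimum(diff, 86400.0 - diff)
```

For example, tod_dist_matrix([0, 86340]) treats 23:59 and 00:00 as 60 seconds apart rather than nearly a full day, which is why small temporal epsilons like 12 seconds remain meaningful.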

Application

Equipped with data-informed parameters, we are ready to cluster spatially and temporally using HDBSCAN. We define the Python function below, which uses the scikit-learn-contrib hdbscan package (McInnes 2020). The function clusters with the haversine distance when clustering spatially, and clusters temporally when given a pre-computed time-distance matrix.

Python Function

# Imports
import numpy as np
import pandas as pd
import hdbscan

# Functions
def runHDB(points,minClusterSize,epsDist,minSampleSize=None,distMatrix=None,verbose=False):

    # Organize parameters
    if minSampleSize is None:
        minSampleSize = minClusterSize # restore default

    # Define cluster object
    if distMatrix is not None:
        clusterFinder = hdbscan.HDBSCAN(min_cluster_size=int(minClusterSize),
                                        min_samples=int(minSampleSize),
                                        metric="precomputed",
                                        cluster_selection_epsilon=epsDist,
                                        cluster_selection_method="eom")
        X = distMatrix

    else:
        clusterFinder = hdbscan.HDBSCAN(min_cluster_size=int(minClusterSize),
                                        min_samples=int(minSampleSize),
                                        metric="haversine",
                                        cluster_selection_epsilon=(epsDist/1000)/6371,
                                        cluster_selection_method="eom")
        X = np.radians(points)
    if verbose:
        print("Running HDBSCAN on {} observations".format(len(X)))

    res = clusterFinder.fit(X)
    y = res.labels_
    if verbose:
        print("Found {} clusters".format(pd.Series(y).max() + 1))

    return y + 1
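Note the unit conversion in the haversine branch above: the haversine metric operates on radians, so an epsilon given in meters must be scaled by Earth's radius. A small helper making that step explicit (the constant and function name are ours, chosen to match the (epsDist/1000)/6371 expression in runHDB):

```python
EARTH_RADIUS_KM = 6371.0

def meters_to_haversine_radians(eps_m):
    # meters -> kilometers -> radians on a sphere of radius 6371 km
    return (eps_m / 1000.0) / EARTH_RADIUS_KM
```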

Cluster Workflow

The computation below applies HDBSCAN with the data-informed parameters for \(cluster\_selection\_epsilon\) to find spatial clusters, then again on each spatial cluster to find temporal limits to those clusters. We use the minimum allowable cluster size of 2 to minimize how “smooth” the cluster density estimate is between nearby points, which mitigates HDBSCAN’s tendency to identify overlapping clusters (“Outlier Detection - Help to Understand the Right Parameter Configuration. · Issue #116 · Scikit-Learn-Contrib/Hdbscan. GitHub” n.d.). We also use a high minimum sample size to discriminate noisy points from cluster boundary points. This method for using minimum cluster size and minimum sample size is discussed in greater detail here (“Outlier Detection - Help to Understand the Right Parameter Configuration. · Issue #116 · Scikit-Learn-Contrib/Hdbscan. GitHub” n.d.).

# Cluster spatially
Y <- runHDB(points=dt[,c("lon","lat")],
            minClusterSize=2,
            minSampleSize=100,
            epsDist=epsSpat)
dt[,"spatClus" := Y]

# Cluster temporally
tempClus <- rep(0,m)
for (i in unique(dt[spatClus != 0, spatClus])) {
  
  Y <- runHDB(points=NULL,
              distMatrix=timeDistMat[dt$spatClus==i,dt$spatClus==i],
              minClusterSize=2,
              minSampleSize=100,
              epsDist=epsTime)
  
  # label
  tempClus[dt$spatClus==i] <- ifelse(Y!=0,paste0(i,".",Y),0)
 
}

# Set cluster column
dt[,"clus" := tempClus]
save(dt,file="products/dt.RData")

The table above shows the results of the initial clustering. We see that some clusters include only two observations and some include more than 100, which is a result of HDBSCAN identifying clusters of varying densities. After computing an initial clustering, we screen out clusters that do not include consecutive observations, as they are not temporally convex (i.e., they combine observations from different times between which there is evidence that the wolf was elsewhere). The following section describes our method to screen the clusters for evidence that the wolf loitered in the cluster location.

Screening

We begin screening by finding clusters with consecutive observations. We define the following R function to find the indices of the consecutive observations (values), and the lengths of each segment of consecutive observations (lengths), using a modified version of the R function available here (“Cran/Cgwtools. GitHub” n.d.).

# Function to find consecutive integers, and the length of each consecutive instance
seqle <- function(x,incr=1) { 
  if(!is.numeric(x)) x <- as.numeric(x) 
  n <- length(x)  
  y <- x[-1L] != x[-n] + incr 
  i <- c(which(y|is.na(y)),n) 
  temp <- list(lengths = diff(c(0L,i)),
               values = x[head(c(0L,i)+1L,-1L)]) 
  return(list(lengths=temp$lengths[temp$lengths > 1] - 1,
              values=temp$values[temp$lengths > 1]))
} 

# Quick look
seqle(which(dt$clus==1.1)) %>% as.data.table() %>% formattable(align="l") %>% as.htmlwidget()

The table above is the result of our seqle function on one cluster. The values column indicates the index of the start of a consecutive segment of observations, and the corresponding lengths column indicates the number of consecutive observations that begin at the value index. We see that for cluster number 1.1 there are consecutive segments with over 1,000 observations.
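For comparison, the same run-length idea can be sketched in Python (an analogue of the seqle concept, not a line-for-line port of the tutorial's R function):

```python
import numpy as np

def runs_of_consecutive(idx):
    """Return (start_values, run_lengths) for runs of consecutive integers of length > 1."""
    idx = np.asarray(idx)
    breaks = np.where(np.diff(idx) != 1)[0]          # positions where a run ends
    starts = np.concatenate(([0], breaks + 1))
    ends = np.concatenate((breaks, [len(idx) - 1]))
    lengths = ends - starts + 1
    keep = lengths > 1
    return idx[starts[keep]], lengths[keep]
```

For example, runs_of_consecutive([1, 2, 3, 7, 9, 10]) reports a run of 3 starting at 1 and a run of 2 starting at 9, discarding the isolated index 7.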

We now apply the seqle function to each cluster, filter out the segments consisting of only one observation (no evidence of loitering), and calculate the loiter time for each consecutive segment in each cluster. We then filter out the clusters that the wolf visited fewer than 10 times, and the clusters where the wolf spent an average of less than 30 minutes. Full code available here.
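A sketch of that screening step in pandas, with a hypothetical per-segment table (the column names and data are illustrative; the thresholds mirror the text, and the tutorial's actual code is linked above):

```python
import pandas as pd

# Hypothetical table: one row per consecutive loiter segment in a cluster
segs = pd.DataFrame({
    "clus": ["1.1"] * 12 + ["2.1"] * 3,
    "loiter_min": [45.0] * 12 + [5.0, 8.0, 6.0],
})

summary = segs.groupby("clus").agg(visits=("loiter_min", "size"),
                                   avg_min=("loiter_min", "mean"))

# Keep clusters visited at least 10 times with an average dwell of 30+ minutes
keep = summary[(summary.visits >= 10) & (summary.avg_min >= 30)]
```

Here cluster 1.1 (12 visits averaging 45 minutes) survives the screen while cluster 2.1 (3 short visits) is dropped.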

Now that we have stored the results in our data and a list object, we can use this information to plot using leaflet. The functions available here provide a comprehensive method to convey the results of our analysis. We omit the functions in this tutorial for brevity.

Results

The map below conveys the results of clustering spatially and temporally using HDBSCAN. The loiter information for each cluster is shown when hovering over each cluster, and a radial heatmap of time of day by weekday is shown when clicking each cluster. We group the clusters by the time of day and include interactive toggles to allow the analyst to study the wolf’s pattern of life.

Conclusion

This tutorial demonstrates a method to extract actionable information from GPS data. A heatmap or geohash density map can provide initial intuition about the subject’s most frequented locations, but more granular information is typically necessary. An analyst viewing only the raw observations can become overwhelmed when attempting to draw meaningful conclusions. Once we cluster spatially and temporally, we can elicit reliable information about when, where, and for how long the subject (here, the wolf of interest) loiters.

Works Cited

Adolfsson, A., M. Ackerman, and N. C. Brownstein. 2019. “To Cluster, or Not to Cluster: An Analysis of Clusterability Methods.” Pattern Recognition 88 (April): 13–26. https://doi.org/10.1016/j.patcog.2018.10.026.
Appelhans, Tim, and Florian Detsch. 2019. “Leafpop: Include Tables, Images and Graphs in Leaflet Pop-Ups.” https://CRAN.R-project.org/package=leafpop.
Appelhans, Tim, Christoph Reudenbach, Kenton Russell, Jochen Darley, Daniel Montague (Leaflet EasyButton plugin), Lorenzo Busetto, Luigi Ranghetti, Miles McBain, and Sebastian Gatscha. 2020. “Leafem: ’Leaflet’ Extensions for ’Mapview’.” https://CRAN.R-project.org/package=leafem.
Arvai, Kevin. 2020a. Kneed: Knee-Point Detection in Python (version 0.6.0). https://github.com/arvkevi/kneed.
———. 2020b. Arvkevi/Kneed. https://github.com/arvkevi/kneed.
Ashbrook, D., and T. Starner. 2002. “Learning Significant Locations and Predicting User Movement with GPS.” In Proceedings. Sixth International Symposium on Wearable Computers, 101–8. Seattle, WA, USA: IEEE. https://doi.org/10.1109/ISWC.2002.1167224.
“Basic Usage of HDBSCAN* for Clustering — Hdbscan 0.8.1 Documentation.” n.d. Accessed April 30, 2020. https://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html.
Bengtsson, Henrik. 2019. “R.utils: Various Programming Utilities.” https://CRAN.R-project.org/package=R.utils.
Birant, Derya, and Alp Kut. 2007. “ST-DBSCAN: An Algorithm for Clustering Spatial–Temporal Data.” Data & Knowledge Engineering, Intelligent data mining, 60 (1): 208–21. https://doi.org/10.1016/j.datak.2006.01.013.
Bivand, Roger, Tim Keitt, Barry Rowlingson, Edzer Pebesma, Michael Sumner, Robert Hijmans, Even Rouault, Frank Warmerdam, Jeroen Ooms, and Colin Rundel. 2019. “Rgdal: Bindings for the ’Geospatial’ Data Abstraction Library.” https://CRAN.R-project.org/package=rgdal.
Bivand, Roger, Colin Rundel, Edzer Pebesma, Rainer Stuetz, Karl Ove Hufthammer, Patrick Giraudoux, Martin Davis, and Sandro Santilli. 2019. “Rgeos: Interface to Geometry Engine - Open Source (’GEOS’).” https://CRAN.R-project.org/package=rgeos.
Campello, Ricardo J. G. B., Davoud Moulavi, Arthur Zimek, and Jörg Sander. 2015. “Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection.” TKDD. https://doi.org/10.1145/2733381.
Chen, Wen, Minhe Ji, and Jianmei Wang. 2014. “T-DBSCAN: A Spatiotemporal Density Clustering for GPS Trajectory Segmentation.” Int. J. Onl. Eng. 10 (6): 19. https://doi.org/10.3991/ijoe.v10i6.3881.
Cheng, Joe, Bhaskar Karambelkar, Yihui Xie, Hadley Wickham, Kenton Russell, Kent Johnson, Barret Schloerke, et al. 2019. “Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library.” https://CRAN.R-project.org/package=leaflet.
Chirico, Michael, and Dmitry Shkolnik. 2019. “geohashTools: Tools for Working with Geohashes.” https://CRAN.R-project.org/package=geohashTools.
“Combining HDBSCAN* with DBSCAN — Hdbscan 0.8.1 Documentation.” n.d. Accessed April 30, 2020. https://hdbscan.readthedocs.io/en/latest/how_to_use_epsilon.html.
“Cran/Cgwtools. GitHub.” n.d. Accessed April 30, 2020. https://github.com/cran/cgwtools.
Dowle, Matt, Arun Srinivasan, Jan Gorecki, Michael Chirico, Pasha Stetsenko, Tom Short, Steve Lianoglou, et al. 2019. “Data.table: Extension of ’Data.frame’.” https://CRAN.R-project.org/package=data.table.
Emerson, John W, and Michael J Kane. n.d. “The R Package Bigmemory: Supporting Efficient Computation and Concurrent Programming with Large Data Sets.” Journal of Statistical Software, 16.
Ester, Martin, Hans-Peter Kriegel, and Xiaowei Xu. n.d. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” 6.
Han, Jiawei, Micheline Kamber, and Jian Pei, eds. 2012. “Front Matter.” In Data Mining (Third Edition), i–v. The Morgan Kaufmann Series in Data Management Systems. Boston: Morgan Kaufmann. https://doi.org/10.1016/B978-0-12-381479-1.00016-2.
Hariharan, Ramaswamy, and Kentaro Toyama. 2004. “Project Lachesis: Parsing and Modeling Location Histories.” In Geographic Information Science, edited by Max J. Egenhofer, Christian Freksa, and Harvey J. Miller, 3234:106–24. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-30231-5_8.
Hijmans, Robert J., Ed Williams, and Chris Vennes. 2019. “Geosphere: Spherical Trigonometry.” https://CRAN.R-project.org/package=geosphere.
Hopkins, Brian, and J. G. Skellam. 1954. “A New Method for Determining the Type of Distribution of Plant Individuals.” Annals of Botany 18 (70): 213–27. https://www.jstor.org/stable/42907238.
Karambelkar, Bhaskar, Barret Schloerke, Bangyou Zheng (Leaflet-search and Leaflet-GPS plugin integration), Robin Cura (Fixes for Draw Options), Markus Voge (Enhancements for Draw Options), Markus Dumke (Bounce Marker addition), Mapbox (leaflet-omnivore, et al. 2018. “Leaflet.extras: Extra Functionality for ’Leaflet’ Package.” https://CRAN.R-project.org/package=leaflet.extras.
Lawson, Richard G., and Peter C. Jurs. 1990. “New Index for Clustering Tendency and Its Application to Chemical Problems.” J. Chem. Inf. Comput. Sci. 30 (1): 36–41. https://doi.org/10.1021/ci00065a010.
Liu, Juhong, Ouri Wolfson, and Huabei Yin. 2006. “Extracting Semantic Location from Outdoor Positioning Systems.” In Proceedings of the 7th International Conference on Mobile Data Management (MDM’06). IEEE Computer Society.
Maklin, Cory. 2019. “DBSCAN Python Example: The Optimal Value for Epsilon (EPS). Medium.” July 14, 2019. https://towardsdatascience.com/machine-learning-clustering-dbscan-determine-the-optimal-value-for-epsilon-eps-python-example-3100091cfbc.
McInnes, Leland. 2020. Hdbscan: Clustering Based on Density with Variable Density Clusters (version 0.8.26). http://github.com/scikit-learn-contrib/hdbscan.
“Movebank.” n.d. Accessed April 30, 2020. https://www.movebank.org/cms/webapp?gwt_fragment=page=studies,path=study492444603.
“NumPy.” 2020. May 4, 2020. https://numpy.org/.
“Outlier Detection - Help to Understand the Right Parameter Configuration. · Issue #116 · Scikit-Learn-Contrib/Hdbscan. GitHub.” n.d. Accessed April 30, 2020. https://github.com/scikit-learn-contrib/hdbscan/issues/116.
“Pandas - Python Data Analysis Library.” 2020. March 18, 2020. https://pandas.pydata.org/.
Pebesma, Edzer, Roger Bivand, Barry Rowlingson, Virgilio Gomez-Rubio, Robert Hijmans, Michael Sumner, Don MacQueen, Jim Lemon, Josh O’Brien, and Joseph O’Rourke. 2020. “Sp: Classes and Methods for Spatial Data.” https://CRAN.R-project.org/package=sp.
Rahmah, Nadia, and Imas Sukaesih Sitanggang. 2016. “Determination of Optimal Epsilon (Eps) Value on DBSCAN Algorithm to Clustering Data on Peatland Hotspots in Sumatra.” IOP Conf. Ser.: Earth Environ. Sci. 31 (January): 012012. https://doi.org/10.1088/1755-1315/31/1/012012.
Ren, Kun, and Kenton Russell. 2016. “Formattable: Create ’Formattable’ Data Structures.” https://CRAN.R-project.org/package=formattable.
RStudio, and Inc. 2019. “Htmltools: Tools for HTML.” https://CRAN.R-project.org/package=htmltools.
Satopaa, Ville, Jeannie Albrecht, David Irwin, and Barath Raghavan. 2011. “Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior.” In 2011 31st International Conference on Distributed Computing Systems Workshops, 166–71. Minneapolis, MN, USA: IEEE. https://doi.org/10.1109/ICDCSW.2011.20.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, Pedro Despouy, and Plotly Technologies Inc. 2020. “Plotly: Create Interactive Web Graphics via ’Plotly.js’.” https://CRAN.R-project.org/package=plotly.
Spinu, Vitalie, Garrett Grolemund, Hadley Wickham, Ian Lyttle, Imanuel Constigan, Jason Law, Doug Mitarotonda, Joseph Larmarange, Jonathan Boiser, and Chel Hee Lee. 2020. “Lubridate: Make Dealing with Dates a Little Easier.” https://CRAN.R-project.org/package=lubridate.
Ushey, Kevin, J. J. Allaire, RStudio, Yuan Tang, Dirk Eddelbuettel, Bryan Lewis, et al. 2020. “Reticulate: Interface to ’Python’.” https://CRAN.R-project.org/package=reticulate.
Vaidyanathan, Ramnath, Yihui Xie, J. J. Allaire, Joe Cheng, Kenton Russell, and RStudio. 2019. “Htmlwidgets: HTML Widgets for R.” https://CRAN.R-project.org/package=htmlwidgets.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and RStudio. 2020. “Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics.” https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and RStudio. 2020. “Dplyr: A Grammar of Data Manipulation.” https://CRAN.R-project.org/package=dplyr.
Xie, Yihui, Joe Cheng, Xianying Tan, J. J. Allaire, Maximilian Girlich, Greg Freedman Ellis, Johannes Rauh, et al. 2020. “DT: A Wrapper of the JavaScript Library ’DataTables’.” https://CRAN.R-project.org/package=DT.